Bookings(in AUD) Exploration by Heki Wong

Load_the_Data

Univariate Plots Section

Summary

## [1] 89206    30
##  [1] "Pnr.Creat.Date"             "month_pnr_create"          
##  [3] "Rloc"                       "Bkg.User.Id"               
##  [5] "Collect.Method"             "Curr.First.Dep.Apt"        
##  [7] "numberOfDay"                "month_dep"                 
##  [9] "from_1st"                   "to_1st"                    
## [11] "from_2nd"                   "to_2nd"                    
## [13] "from_3rd"                   "to_3rd"                    
## [15] "destination"                "type"                      
## [17] "Season"                     "Curr.First.Dep.Date"       
## [19] "Pnr.Pax.Count"              "Lang"                      
## [21] "Pnr.Curr.Code"              "Pnr.Fare.Amt"              
## [23] "Bkg.Subclass"               "Bkg_Subclass"              
## [25] "Net.Net.Rev"                "Sales.Promo"               
## [27] "first_ticket_O"             "first_ticket_D"            
## [29] "Route"                      "Pnr.Creat.Pseudo.City.Code"
## 'data.frame':    89206 obs. of  30 variables:
##  $ Pnr.Creat.Date            : Factor w/ 365 levels "1/1/15","1/10/15",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ month_pnr_create          : Factor w/ 12 levels ".01.",".02.",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Rloc                      : Factor w/ 88689 levels "2.34E+08","211 (Nov)DG",..: 62064 61972 62040 62067 61906 61917 62199 61948 62019 62068 ...
##  $ Bkg.User.Id               : Factor w/ 20257 levels "1000054769","1000055266",..: 8947 14163 20184 10839 19254 6424 9034 19461 4291 20230 ...
##  $ Collect.Method            : Factor w/ 2 levels "","ET": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Curr.First.Dep.Apt        : Factor w/ 126 levels "ADL","AMS","ARN",..: 92 58 58 92 58 58 58 58 58 10 ...
##  $ numberOfDay               : int  177 85 48 44 10 22 11 109 100 95 ...
##  $ month_dep                 : Factor w/ 12 levels ".01.",".02.",..: 6 3 2 2 1 1 1 4 4 4 ...
##  $ from_1st                  : Factor w/ 66 levels "ADL","AMS","ARN",..: 53 53 53 53 53 53 53 53 1 11 ...
##  $ to_1st                    : Factor w/ 107 levels "ADL","AMS","ARN",..: 68 94 68 68 94 94 94 68 49 49 ...
##  $ from_2nd                  : Factor w/ 88 levels "","AMS","BKK",..: 44 77 44 44 77 77 77 60 44 44 ...
##  $ to_2nd                    : Factor w/ 110 levels "","ADL","AMS",..: 80 52 80 80 52 52 52 52 2 110 ...
##  $ from_3rd                  : Factor w/ 62 levels "","DOH","FCO",..: 25 1 25 25 1 1 1 1 1 1 ...
##  $ to_3rd                    : Factor w/ 99 levels "","ADL","AMS",..: 44 1 44 44 1 1 1 1 1 1 ...
##  $ destination               : Factor w/ 95 levels "ADL","AMS","BKK",..: 43 43 43 43 43 43 43 43 1 95 ...
##  $ type                      : Factor w/ 3 levels "Muli","Oneway",..: 3 3 3 3 3 3 3 3 3 2 ...
##  $ Season                    : Factor w/ 4 levels "Season1","Season2",..: 2 1 1 1 1 1 1 2 2 2 ...
##  $ Curr.First.Dep.Date       : Factor w/ 702 levels "1-Apr-15","1-Apr-16",..: 448 450 212 120 53 352 76 276 46 609 ...
##  $ Pnr.Pax.Count             : int  1 1 1 1 1 1 1 1 1 2 ...
##  $ Lang                      : Factor w/ 14 levels "CN","DE","FR",..: 4 4 4 1 4 4 4 4 4 4 ...
##  $ Pnr.Curr.Code             : Factor w/ 2 levels "","AUD": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Pnr.Fare.Amt              : num  2879 1789 1748 2432 2394 ...
##  $ Bkg.Subclass              : Factor w/ 297 levels "B","B, C","B, E",..: 11 28 28 34 47 47 47 80 80 80 ...
##  $ Bkg_Subclass              : Factor w/ 20 levels "B","C","D","E",..: 1 1 1 1 1 1 1 4 4 4 ...
##  $ Net.Net.Rev               : num  1866 1127 922 1503 1599 ...
##  $ Sales.Promo               : Factor w/ 27 levels "","AUSHKGYOW5 (May)15",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ first_ticket_O            : Factor w/ 66 levels "ADL","AMS","ARN",..: 53 53 53 53 53 53 53 53 1 11 ...
##  $ first_ticket_D            : Factor w/ 107 levels "ADL","AMS","ARN",..: 68 94 68 68 94 94 94 68 49 49 ...
##  $ Route                     : Factor w/ 3738 levels "ADLHKG","ADLHKG, AMSHKG, DOHBCN, HKGADL, HKGDOH",..: 2897 3659 2897 2897 3659 3659 3659 3050 112 1035 ...
##  $ Pnr.Creat.Pseudo.City.Code: Factor w/ 3 levels "SYDCX08AA","SYDCX08CP",..: 1 1 1 1 1 1 1 1 1 1 ...
##  [1] "CN" "DE" "FR" "GB" "ID" "IT" "JP" "KO" "KR" "NL" "RE" "RU" "TH" "TW"
##  [1] "ADL" "AMS" "ARN" "ATH" "BAH" "BCN" "BJS" "BKI" "BKK" "BLR" "BNE"
## [12] "BOM" "BOS" "BUD" "CAN" "CCU" "CDG" "CEB" "CGK" "CGO" "CJU" "CKG"
## [23] "CMB" "CNS" "CNX" "CPH" "CPT" "CRK" "CSX" "CTS" "CTU" "DAC" "DAD"
## [34] "DEL" "DFW" "DLC" "DME" "DOH" "DPS" "DUR" "DUS" "DXB" "EWR" "FCO"
## [45] "FOC" "FRA" "FUK" "HAK" "HAN" "HGH" "HIJ" "HKD" "HKG" "HND" "JFK"
## [56] "KHH" "KUL" "MEL" "MLE" "NGB" "PEK" "PER" "PVG" "SYD" "TPE" "XIY"
##   [1] "ADL" "AMS" "ARN" "ATH" "BAH" "BCN" "BKI" "BKK" "BLR" "BNE" "BOM"
##  [12] "BOS" "BUD" "CAN" "CCU" "CDG" "CEB" "CGK" "CGO" "CJU" "CKG" "CMB"
##  [23] "CNS" "CNX" "CPH" "CRK" "CSX" "CTS" "CTU" "DAC" "DAD" "DEL" "DFW"
##  [34] "DLC" "DME" "DOH" "DPS" "DUS" "DXB" "EWR" "FCO" "FOC" "FRA" "FUK"
##  [45] "GVA" "HAK" "HAN" "HGH" "HKG" "HKT" "HND" "HYD" "ICN" "IST" "JFK"
##  [56] "JNB" "KHH" "KIX" "KMG" "KTM" "KUL" "KWL" "LAX" "LHR" "MAA" "MAD"
##  [67] "MAN" "MEL" "MLE" "MNL" "MXP" "NGB" "NGO" "NKG" "NRT" "OKA" "ORD"
##  [78] "PAR" "PEK" "PEN" "PER" "PNH" "PUS" "PVG" "REP" "RGN" "RMQ" "RUH"
##  [89] "SFO" "SGN" "SHA" "SIN" "SUB" "SYD" "SYX" "TAO" "TPE" "TSN" "USM"
## [100] "VCE" "WNZ" "WUH" "XIY" "XMN" "YVR" "YYZ" "ZRH"
##  [1] "ADL" "AMS" "BKK" "BLR" "BNE" "BOM" "BOS" "CAN" "CCU" "CDG" "CEB"
## [12] "CGK" "CGO" "CJU" "CKG" "CMB" "CNS" "CNX" "CPT" "CRK" "CSX" "CTS"
## [23] "CTU" "DAC" "DAD" "DEL" "DFW" "DLC" "DME" "DOH" "DPS" "DUR" "DUS"
## [34] "DXB" "EWR" "FCO" "FOC" "FRA" "FUK" "HAK" "HAN" "HGH" "HKG" "HKT"
## [45] "HND" "HYD" "ICN" "JFK" "JMU" "JNB" "KHH" "KIX" "KMG" "KTM" "KUL"
## [56] "KWL" "LAX" "LHR" "MAA" "MAD" "MAN" "MEL" "MLE" "MNL" "MXP" "NGB"
## [67] "NGO" "NKG" "NRT" "OKA" "ORD" "PEK" "PER" "PNH" "PUS" "PVG" "RMQ"
## [78] "RUH" "SFO" "SGN" "SHA" "SIN" "SYD" "SYX" "TAO" "TPE" "TSN" "VVO"
## [89] "WNZ" "WUH" "XIY" "XMN" "YVR" "YYZ" "ZRH"
##  [1] "B" "C" "D" "E" "F" "H" "I" "J" "K" "L" "M" "N" "Q" "R" "S" "T" "U"
## [18] "V" "W" "Y"
##   Pnr.Creat.Date  month_pnr_create      Rloc         Bkg.User.Id   
##  15/10/15:  697   .10.   :11437    3WVWN4 :    3   GUEST   :  487  
##  19/2/15 :  633   .02.   :10858    222MBM :    2   GUEST366:   93  
##  16/10/15:  617   .03.   : 9827    224FOR :    2   GUEST765:   87  
##  9/2/15  :  608   .11.   : 7784    22K5RR :    2   GUEST392:   85  
##  19/10/15:  589   .04.   : 7401    23TSE8 :    2   GUEST456:   85  
##  17/2/15 :  567   .09.   : 6800    242SJ6 :    2   GUEST457:   85  
##  (Other) :85495   (Other):35099    (Other):89193   (Other) :88284  
##  Collect.Method Curr.First.Dep.Apt  numberOfDay       month_dep    
##    :    4       HKG    :23418      Min.   :-18.00   .06.   : 8921  
##  ET:89202       SYD    :14016      1st Qu.: 29.00   .05.   : 8794  
##                 MEL    : 9756      Median : 74.00   .11.   : 8117  
##                 PER    : 7809      Mean   : 97.35   .04.   : 8085  
##                 BNE    : 4355      3rd Qu.:144.00   .03.   : 8072  
##                 ADL    : 4004      Max.   :492.00   .12.   : 8061  
##                 (Other):25848                       (Other):39156  
##     from_1st         to_1st         from_2nd         to_2nd     
##  HKG    :57990   HKG    :31132   HKG    :52809   HKG    :31092  
##  BNE    : 8605   SYD    :14836   SYD    :13890   SYD    : 8652  
##  ADL    : 7148   MEL    :13194   MEL    : 8953   BNE    : 6077  
##  CNS    : 1917   PER    : 7634   PER    : 5414          : 5261  
##  SYD    : 1762   PEK    : 2914          : 5261   PER    : 5174  
##  MEL    : 1275   ICN    : 2536   CNS    :  229   ADL    : 4909  
##  (Other):10509   (Other):16960   (Other): 2650   (Other):28041  
##     from_3rd         to_3rd       destination        type      
##         :40307          :40307   HKG    :79862   Muli  :19930  
##  HKG    :19674   HKG    :29050   BNE    : 2164   Oneway: 7771  
##  MEL    : 6228   SYD    : 4661   ADL    : 1684   return:61505  
##  PER    : 3554   MEL    : 2740   CNS    :  542                 
##  SYD    : 2964   PER    : 1624   SYD    :  374                 
##  ICN    : 2363   PVG    : 1363   PEK    :  350                 
##  (Other):14116   (Other): 9461   (Other): 4230                 
##      Season      Curr.First.Dep.Date Pnr.Pax.Count        Lang      
##  Season1:20096   26-Jul-15:  386     Min.   :1.000   GB     :77207  
##  Season2:25800   23-Dec-15:  347     1st Qu.:1.000   CN     : 5760  
##  Season3:20292   23-Nov-15:  343     Median :1.000   TW     : 5179  
##  Season4:23018   31-Jan-16:  341     Mean   :1.373   KO     :  619  
##                  1-Feb-16 :  337     3rd Qu.:2.000   JP     :  220  
##                  2-Aug-15 :  323     Max.   :6.000   DE     :   68  
##                  (Other)  :87129                     (Other):  153  
##  Pnr.Curr.Code  Pnr.Fare.Amt      Bkg.Subclass    Bkg_Subclass  
##     :  320     Min.   :    0.0   V      :29638   V      :29968  
##  AUD:88886     1st Qu.:  938.7   S      : 9908   L      :14389  
##                Median : 1365.3   L, V   : 8802   S      : 9922  
##                Mean   : 1979.2   M, V   : 5715   M      : 9171  
##                3rd Qu.: 2196.4   E      : 3213   K      : 6999  
##                Max.   :33517.4   K, V   : 3149   E      : 5718  
##                                  (Other):28781   (Other):13039  
##   Net.Net.Rev                   Sales.Promo    first_ticket_O 
##  Min.   :   15.93                     :88721   HKG    :57990  
##  1st Qu.:  421.92   RUGBY7AU          :  326   BNE    : 8605  
##  Median :  612.36   AUSHKGYOW5 (May)15:   49   ADL    : 7148  
##  Mean   :  950.15   MICE549           :   27   CNS    : 1917  
##  3rd Qu.: 1009.42   MICE629           :   15   SYD    : 1762  
##  Max.   :20161.28   MICE989           :   15   MEL    : 1275  
##  NA's   :26605      (Other)           :   53   (Other):10509  
##  first_ticket_D             Route       Pnr.Creat.Pseudo.City.Code
##  HKG    :31132   HKGSYD, SYDHKG:11697   SYDCX08AA:88597           
##  SYD    :14836   HKGMEL, MELHKG: 6946   SYDCX08CP:  330           
##  MEL    :13194   HKGPER, PERHKG: 4034   SYDCX08PA:  279           
##  PER    : 7634   BNEHKG, HKGBNE: 2001                             
##  PEK    : 2914   SYDHKG        : 1762                             
##  ICN    : 2536   ADLHKG, HKGADL: 1657                             
##  (Other):16960   (Other)       :61109

Most passengers buy return tickets flight from HKG in english version. For each transaction , the median of the passenger is one. The amount of fare from AUD 0 to AUD 33,517.4 There is 61505 is return ticket. Some of transaction duplicate that the number of RLOC is more than one.

Remove all duplicate row

## [1] 88356    30
##   Pnr.Creat.Date  month_pnr_create          Rloc         Bkg.User.Id   
##  15/10/15:  692   .10.   :11360    2.34E+08   :    1   GUEST   :  486  
##  19/2/15 :  628   .02.   :10755    211 (Nov)DG:    1   GUEST366:   91  
##  16/10/15:  614   .03.   : 9716    22 (Feb)V2 :    1   GUEST765:   87  
##  9/2/15  :  602   .11.   : 7725    2222MP     :    1   GUEST392:   85  
##  19/10/15:  579   .04.   : 7297    2224JW     :    1   GUEST456:   84  
##  18/2/15 :  563   .09.   : 6748    2224UD     :    1   GUEST457:   84  
##  (Other) :84678   (Other):34755    (Other)    :88350   (Other) :87439  
##  Collect.Method Curr.First.Dep.Apt  numberOfDay      month_dep    
##    :    4       HKG    :23142      Min.   :  0.0   .06.   : 8845  
##  ET:88352       SYD    :13935      1st Qu.: 29.0   .05.   : 8725  
##                 MEL    : 9685      Median : 74.0   .03.   : 8039  
##                 PER    : 7779      Mean   : 97.2   .11.   : 8035  
##                 BNE    : 4335      3rd Qu.:144.0   .04.   : 8008  
##                 ADL    : 3968      Max.   :492.0   .12.   : 7967  
##                 (Other):25512                      (Other):38737  
##     from_1st         to_1st         from_2nd         to_2nd     
##  HKG    :57459   HKG    :30813   HKG    :52436   HKG    :30760  
##  BNE    : 8538   SYD    :14728   SYD    :13763   SYD    : 8585  
##  ADL    : 7085   MEL    :13098   MEL    : 8865   BNE    : 6043  
##  CNS    : 1903   PER    : 7580   PER    : 5361   PER    : 5151  
##  SYD    : 1710   PEK    : 2881          : 5117          : 5117  
##  MEL    : 1254   ICN    : 2515   CTU    :  228   ADL    : 4872  
##  (Other):10407   (Other):16741   (Other): 2586   (Other):27828  
##     from_3rd         to_3rd       destination        type      
##         :39775          :39775   HKG    :79140   Muli  :19786  
##  HKG    :19531   HKG    :28876   BNE    : 2145   Oneway: 7578  
##  MEL    : 6194   SYD    : 4628   ADL    : 1673   return:60992  
##  PER    : 3535   MEL    : 2705   CNS    :  536                 
##  SYD    : 2952   PER    : 1619   SYD    :  350                 
##  ICN    : 2353   PVG    : 1356   PEK    :  345                 
##  (Other):14016   (Other): 9397   (Other): 4167                 
##      Season      Curr.First.Dep.Date Pnr.Pax.Count        Lang      
##  Season1:19969   26-Jul-15:  380     Min.   :1.000   GB     :76459  
##  Season2:25578   23-Dec-15:  345     1st Qu.:1.000   CN     : 5705  
##  Season3:20056   23-Nov-15:  342     Median :1.000   TW     : 5139  
##  Season4:22753   31-Jan-16:  341     Mean   :1.366   KO     :  615  
##                  1-Feb-16 :  337     3rd Qu.:2.000   JP     :  217  
##                  2-Aug-15 :  319     Max.   :6.000   DE     :   68  
##                  (Other)  :86292                     (Other):  153  
##  Pnr.Curr.Code  Pnr.Fare.Amt      Bkg.Subclass    Bkg_Subclass  
##     :    0     Min.   :    0.0   V      :29281   V      :29611  
##  AUD:88356     1st Qu.:  939.9   S      : 9827   L      :14271  
##                Median : 1362.7   L, V   : 8735   S      : 9841  
##                Mean   : 1976.5   M, V   : 5665   M      : 9071  
##                3rd Qu.: 2189.9   E      : 3185   K      : 6955  
##                Max.   :33517.4   K, V   : 3132   E      : 5678  
##                                  (Other):28531   (Other):12929  
##   Net.Net.Rev                   Sales.Promo    first_ticket_O 
##  Min.   :   15.93                     :87872   HKG    :57459  
##  1st Qu.:  422.81   RUGBY7AU          :  325   BNE    : 8538  
##  Median :  613.78   AUSHKGYOW5 (May)15:   49   ADL    : 7085  
##  Mean   :  952.77   MICE549           :   27   CNS    : 1903  
##  3rd Qu.: 1012.07   MICE629           :   15   SYD    : 1710  
##  Max.   :20161.28   MICE989           :   15   MEL    : 1254  
##  NA's   :26392      (Other)           :   53   (Other):10407  
##  first_ticket_D             Route       Pnr.Creat.Pseudo.City.Code
##  HKG    :30813   HKGSYD, SYDHKG:11632   SYDCX08AA:87748           
##  SYD    :14728   HKGMEL, MELHKG: 6907   SYDCX08CP:  330           
##  MEL    :13098   HKGPER, PERHKG: 4011   SYDCX08PA:  278           
##  PER    : 7580   BNEHKG, HKGBNE: 1988                             
##  PEK    : 2881   SYDHKG        : 1710                             
##  ICN    : 2515   ADLHKG, HKGADL: 1649                             
##  (Other):16741   (Other)       :60459

Suppose the date of booking is earlier than date of flight and the currency is AUD, i.e.numberOfDay >= 0 and Pnr.Curr.Code==‘AUD’ .Also , there is no duplicated RLOC(duplicated data). 850 rows is removed from the dataset. After cleared data, there is 88356 rows

Check Pnr.Fare.Amt/Net.Net.Rev/Number.of.day

Pnr.Fare.Amt

Most tickets is around AUD 1000

Pnr.Fare.Amt

## 
## FALSE  TRUE 
## 88234   122
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   939.9  1363.0  1976.0  2190.0 33520.0

There were 122 transctions are free of charge in 2015. The mean of each transction is AUD 1976. The median of each transction is AUD 1363.

Net.Net.Rev

Most of the net revenue for each transaction is around AUD 500

Net.Net.Rev

## 
## FALSE 
## 61964
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##    15.93   422.80   613.80   952.80  1012.00 20160.00    26392

There is 26392 which do not have record. The median of the net revenue is AUD 613.8

Number.of.day

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Most passengers will plan the trip around 3 months before.

Number.of.day

## 
## FALSE  TRUE 
## 87807   549
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    29.0    74.0    97.2   144.0   492.0

There is 549 passengers bought the ticket shortly.

I am wonder why there is so many passengers flight from HK and if they flight to Australia.

Origin = HKG

## [1] 57459    30
##   Pnr.Creat.Date  month_pnr_create          Rloc         Bkg.User.Id   
##  15/10/15:  518   .02.   : 7502    211 (Nov)DG:    1   GUEST   :  334  
##  19/2/15 :  472   .10.   : 7219    22 (Feb)V2 :    1   GUEST366:   68  
##  9/2/15  :  446   .03.   : 6259    2224JW     :    1   GUEST001:   58  
##  18/2/15 :  441   .11.   : 5023    2224UD     :    1   GUEST392:   57  
##  16/10/15:  433   .04.   : 4733    2224VP     :    1   GUEST748:   57  
##  17/2/15 :  419   .05.   : 4340    22282J     :    1   GUEST227:   56  
##  (Other) :54730   (Other):22383    (Other)    :57453   (Other) :56829  
##  Collect.Method Curr.First.Dep.Apt  numberOfDay       month_dep    
##    :    4       HKG    :18558      Min.   :  0.00   .05.   : 5656  
##  ET:57455       SYD    :10242      1st Qu.: 29.00   .06.   : 5512  
##                 MEL    : 7116      Median : 71.00   .11.   : 5477  
##                 PER    : 6257      Mean   : 94.87   .04.   : 5368  
##                 PEK    : 2301      3rd Qu.:138.00   .12.   : 5282  
##                 PVG    : 1963      Max.   :491.00   .03.   : 5216  
##                 (Other):11022                       (Other):24948  
##     from_1st         to_1st         from_2nd         to_2nd     
##  HKG    :57459   SYD    :14728   HKG    :28593   HKG    :28564  
##  ADL    :    0   MEL    :13098   SYD    :13760   SYD    : 8324  
##  AMS    :    0   PER    : 7580   MEL    : 8865   PER    : 5067  
##  ARN    :    0   PEK    : 2881   PER    : 5360   MEL    : 3113  
##  ATH    :    0   ICN    : 2515          :  294   TPE    : 2655  
##  BAH    :    0   LHR    : 2496   PEK    :  120   PVG    : 1962  
##  (Other):    0   (Other):14161   (Other):  467   (Other): 7774  
##     from_3rd         to_3rd       destination        type      
##         :28548   HKG    :28687   HKG    :56993   Muli  :  166  
##  MEL    : 6163          :28548   SYD    :  127   Oneway:  300  
##  PER    : 3515   TPE    :  120   MEL    :   70   return:56993  
##  SYD    : 2916   YVR    :   34   ICN    :   63                 
##  ICN    : 2349   SYD    :   22   NRT    :   48                 
##  PEK    : 2294   PER    :    8   PER    :   36                 
##  (Other):11674   (Other):   40   (Other):  122                 
##      Season      Curr.First.Dep.Date Pnr.Pax.Count        Lang      
##  Season1:12542   26-Jul-15:  254     Min.   :1.000   GB     :50412  
##  Season2:16536   23-Nov-15:  250     1st Qu.:1.000   CN     : 3297  
##  Season3:13008   1-Feb-16 :  233     Median :1.000   TW     : 3048  
##  Season4:15373   8-Jun-15 :  233     Mean   :1.341   KO     :  423  
##                  26-Jan-16:  228     3rd Qu.:1.000   JP     :  177  
##                  12-Dec-15:  222     Max.   :6.000   IT     :   30  
##                  (Other)  :56039                     (Other):   72  
##  Pnr.Curr.Code  Pnr.Fare.Amt      Bkg.Subclass    Bkg_Subclass  
##     :    0     Min.   :    0.0   V      :18915   V      :19169  
##  AUD:57459     1st Qu.:  940.9   S      : 7177   L      : 9241  
##                Median : 1347.9   L, V   : 5755   S      : 7188  
##                Mean   : 1923.0   M, V   : 3822   M      : 5917  
##                3rd Qu.: 2137.2   K, V   : 2127   K      : 4573  
##                Max.   :33517.4   L      : 1846   E      : 3565  
##                                  (Other):17817   (Other): 7806  
##   Net.Net.Rev         Sales.Promo    first_ticket_O  first_ticket_D 
##  Min.   :   15.93           :57134   HKG    :57459   SYD    :14728  
##  1st Qu.:  430.92   RUGBY7AU:  227   ADL    :    0   MEL    :13098  
##  Median :  620.50   MICE549 :   26   AMS    :    0   PER    : 7580  
##  Mean   :  942.71   MICE629 :   15   ARN    :    0   PEK    : 2881  
##  3rd Qu.: 1010.91   MICE989 :   13   ATH    :    0   ICN    : 2515  
##  Max.   :17994.10   MICE700 :   10   BAH    :    0   LHR    : 2496  
##  NA's   :16265      (Other) :   34   (Other):    0   (Other):14161  
##                             Route       Pnr.Creat.Pseudo.City.Code
##  HKGSYD, SYDHKG                :11632   SYDCX08AA:56990           
##  HKGMEL, MELHKG                : 6907   SYDCX08CP:  226           
##  HKGPER, PERHKG                : 4011   SYDCX08PA:  243           
##  HKGPEK, HKGSYD, PEKHKG, SYDHKG: 1363                             
##  HKGLHR, HKGSYD, LHRHKG, SYDHKG: 1160                             
##  HKGMEL, HKGPEK, MELHKG, PEKHKG: 1022                             
##  (Other)                       :31364
## 
## .01. .02. .03. .04. .05. .06. .07. .08. .09. .10. .11. .12. 
## 3819 3507 5216 5368 5656 5512 4898 4192 3918 4614 5477 5282
## 
##     B     C     D     E     F     H     I     J     K     L     M     N 
##  1083    93   211  3565     1  2180  1698    15  4573  9241  5917  1913 
##     Q     R     S     T     U     V     W     Y 
##     0   353  7188   155    59 19169    26    19

There is 57459 bookings(65%) paid in AUD flight from HKG. I am so surprise that the popular months is not February and August (the school starting day) and the popular booking class is not business class (for business trip)

Origin = HKG (in month of departure)

The popular city( 1st destination) are PAR(Paris), SHA(Shanghai)and YVR(Vancouver), all of that are commercial city.

create AU table

Origin = HKG , destination = AU

## [1] 35628    30
##      to_1st       Pnr.Creat.Date  month_pnr_create          Rloc      
##  SYD    :14728   15/10/15:  330   .10.   : 4635    211 (Nov)DG:    1  
##  MEL    :13098   19/2/15 :  298   .02.   : 4392    22 (Feb)V2 :    1  
##  PER    : 7580   16/10/15:  265   .03.   : 3511    2224JW     :    1  
##  BNE    :  189   17/2/15 :  264   .11.   : 3117    2224VP     :    1  
##  CNS    :   33   9/2/15  :  264   .09.   : 2888    2228IG     :    1  
##  ADL    :    0   19/10/15:  258   .04.   : 2867    222A5C     :    1  
##  (Other):    0   (Other) :33949   (Other):14218    (Other)    :35622  
##    Bkg.User.Id    Collect.Method Curr.First.Dep.Apt  numberOfDay    
##  GUEST   :  255     :    3       HKG    :15665      Min.   :  0.00  
##  GUEST305:   40   ET:35625       SYD    : 5551      1st Qu.: 28.00  
##  GUEST366:   40                  MEL    : 4351      Median : 65.00  
##  GUEST909:   37                  PER    : 2898      Mean   : 89.83  
##  GUEST546:   36                  TPE    : 1409      3rd Qu.:127.00  
##  GUEST582:   36                  PVG    : 1306      Max.   :491.00  
##  (Other) :35184                  (Other): 4448                      
##    month_dep        from_1st        from_2nd         to_2nd     
##  .11.   : 3734   HKG    :35628   HKG    :11903   HKG    :23472  
##  .03.   : 3350   ADL    :    0   SYD    :11785   TPE    : 2644  
##  .12.   : 3276   AMS    :    0   MEL    : 7082   PVG    : 1940  
##  .04.   : 3268   ARN    :    0   PER    : 4105   PEK    : 1141  
##  .05.   : 3233   ATH    :    0          :  251   XMN    :  953  
##  .06.   : 3147   BAH    :    0   PEK    :   95   WUH    :  772  
##  (Other):15620   (Other):    0   (Other):  407   (Other): 4706  
##     from_3rd         to_3rd       destination        type      
##         :23493          :23493   HKG    :35217   Muli  :  159  
##  MEL    : 5709   HKG    :12101   SYD    :  123   Oneway:  252  
##  PER    : 3216   YVR    :    7   MEL    :   69   return:35217  
##  SYD    : 2527   SYD    :    5   ICN    :   61                 
##  ICN    :   86   TPE    :    5   NRT    :   46                 
##  PEK    :   85   PEK    :    4   PER    :   36                 
##  (Other):  512   (Other):   13   (Other):   76                 
##      Season      Curr.First.Dep.Date Pnr.Pax.Count        Lang      
##  Season1: 8084   23-Nov-15:  178     Min.   :1.000   GB     :31276  
##  Season2: 9648   2-Feb-16 :  177     1st Qu.:1.000   TW     : 2319  
##  Season3: 7888   26-Jul-15:  172     Median :1.000   CN     : 1945  
##  Season4:10008   26-Jan-16:  171     Mean   :1.293   KO     :   37  
##                  12-Dec-15:  165     3rd Qu.:1.000   JP     :   18  
##                  1-Feb-16 :  163     Max.   :6.000   IT     :   13  
##                  (Other)  :34602                     (Other):   20  
##  Pnr.Curr.Code  Pnr.Fare.Amt      Bkg.Subclass    Bkg_Subclass  
##     :    0     Min.   :    0.0   V      :11771   V      :11991  
##  AUD:35628     1st Qu.:  926.7   S      : 4224   L      : 5809  
##                Median : 1289.4   L, V   : 3793   S      : 4228  
##                Mean   : 1775.3   M, V   : 2689   M      : 3744  
##                3rd Qu.: 1944.3   K, V   : 1623   K      : 3191  
##                Max.   :27202.2   L, M   : 1034   E      : 2124  
##                                  (Other):10494   (Other): 4541  
##   Net.Net.Rev         Sales.Promo    first_ticket_O  first_ticket_D 
##  Min.   :   15.93           :35346   HKG    :35628   SYD    :14728  
##  1st Qu.:  440.00   RUGBY7AU:  189   ADL    :    0   MEL    :13098  
##  Median :  604.66   MICE549 :   26   AMS    :    0   PER    : 7580  
##  Mean   :  889.32   MICE629 :   15   ARN    :    0   BNE    :  189  
##  3rd Qu.:  965.54   MICE989 :   12   ATH    :    0   CNS    :   33  
##  Max.   :17947.72   MICE700 :    9   BAH    :    0   ADL    :    0  
##  NA's   :9633       (Other) :   31   (Other):    0   (Other):    0  
##                             Route       Pnr.Creat.Pseudo.City.Code
##  HKGSYD, SYDHKG                :11632   SYDCX08AA:35172           
##  HKGMEL, MELHKG                : 6907   SYDCX08CP:  226           
##  HKGPER, PERHKG                : 4011   SYDCX08PA:  230           
##  HKGMEL, HKGPEK, MELHKG, PEKHKG: 1022                             
##  HKGPER, HKGPVG, PERHKG, PVGHKG:  933                             
##  HKGSYD, HKGTPE, SYDHKG, TPEHKG:  826                             
##  (Other)                       :10297

There is 35628 bookings paid in AUD flight from HKG to AU. It is not surprise that the most popular city are SYD(Sydney) , PER(Perth) and MEL(Melbourne) , are main cities in Australia.

Origin = AU

## [1] 14089    30
##     from_1st     Pnr.Creat.Date  month_pnr_create       Rloc      
##  BNE    :8538   15/10/15:   84   .10.   :1692     2.34E+08:    1  
##  CNS    :1903   26/10/15:   84   .03.   :1580     2228AK  :    1  
##  SYD    :1710   19/10/15:   77   .02.   :1517     222DYF  :    1  
##  MEL    :1254   16/10/15:   75   .11.   :1216     2239AK  :    1  
##  PER    : 684   9/2/15  :   74   .04.   :1158     2239VA  :    1  
##  ADL    :   0   12/2/15 :   72   .05.   :1127     223TYU  :    1  
##  (Other):   0   (Other) :13623   (Other):5799     (Other) :14083  
##    Bkg.User.Id    Collect.Method Curr.First.Dep.Apt  numberOfDay   
##  GUEST   :   80     :    0       BNE    :4212       Min.   :  0.0  
##  GUEST315:   22   ET:14089       HKG    :2208       1st Qu.: 16.0  
##  GUEST515:   21                  SYD    :1743       Median : 51.0  
##  GUEST907:   21                  MEL    :1218       Mean   : 81.6  
##  GUEST097:   20                  CNS    :1203       3rd Qu.:124.0  
##  GUEST583:   20                  PER    : 681       Max.   :469.0  
##  (Other) :13905                  (Other):2824                      
##    month_dep        to_1st         from_2nd        to_2nd    
##  .06.   :1472   HKG    :14089   HKG    :8608   BNE    :6000  
##  .05.   :1411   ADL    :    0          :4412          :4412  
##  .03.   :1405   AMS    :    0   CTU    : 163   CNS    :1075  
##  .04.   :1296   ARN    :    0   HGH    : 152   HKG    :1058  
##  .11.   :1274   ATH    :    0   CDG    : 107   PVG    : 159  
##  .07.   :1178   BAH    :    0   FCO    :  88   LHR    : 152  
##  (Other):6053   (Other):    0   (Other): 559   (Other):1233  
##     from_3rd        to_3rd      destination       type          Season    
##         :8088          :8088   HKG    :9279   Muli  :5971   Season1:3324  
##  HKG    :5947   LHR    : 958   BNE    :2016   Oneway:5791   Season2:4179  
##  PVG    :  16   BNE    : 941   CNS    : 324   return:2327   Season3:3133  
##  PEK    :   8   PVG    : 705   LHR    : 202                 Season4:3453  
##  LHR    :   4   PEK    : 596   CTU    : 169                               
##  MNL    :   4   ICN    : 358   PVG    : 163                               
##  (Other):  22   (Other):2443   (Other):1936                               
##  Curr.First.Dep.Date Pnr.Pax.Count        Lang       Pnr.Curr.Code
##  25-Nov-15:   64     Min.   :1.000   GB     :11486      :    0    
##  12-Nov-15:   61     1st Qu.:1.000   TW     : 1330   AUD:14089    
##  30-Jun-15:   61     Median :1.000   CN     : 1052                
##  14-Nov-15:   57     Mean   :1.348   KO     :  113                
##  21-Jul-15:   55     3rd Qu.:2.000   DE     :   33                
##  26-Jul-15:   54     Max.   :6.000   FR     :   24                
##  (Other)  :13737                     (Other):   51                
##   Pnr.Fare.Amt      Bkg.Subclass   Bkg_Subclass   Net.Net.Rev      
##  Min.   :    0.0   V      :5226   V      :5255   Min.   :   58.54  
##  1st Qu.:  845.2   L, V   : 975   L      :1886   1st Qu.:  389.96  
##  Median : 1251.6   S      : 932   M      :1473   Median :  584.18  
##  Mean   : 1921.8   M      : 764   K      :1080   Mean   :  951.99  
##  3rd Qu.: 2114.7   E      : 751   E      :1075   3rd Qu.:  979.48  
##  Max.   :30368.8   M, V   : 670   S      : 932   Max.   :17846.52  
##                    (Other):4771   (Other):2388   NA's   :3724      
##              Sales.Promo    first_ticket_O first_ticket_D 
##                    :13961   BNE    :8538   HKG    :14089  
##  RUGBY7AU          :   73   CNS    :1903   ADL    :    0  
##  AUSHKGYOW5 (May)15:   46   SYD    :1710   AMS    :    0  
##  MICE700           :    3   MEL    :1254   ARN    :    0  
##  MICE684           :    1   PER    : 684   ATH    :    0  
##  MICE697           :    1   ADL    :   0   BAH    :    0  
##  (Other)           :    4   (Other):   0   (Other):    0  
##                             Route      Pnr.Creat.Pseudo.City.Code
##  BNEHKG, HKGBNE                :1988   SYDCX08AA:13970           
##  SYDHKG                        :1710   SYDCX08CP:   98           
##  MELHKG                        :1253   SYDCX08PA:   21           
##  BNEHKG, HKGBNE, HKGLHR, LHRHKG: 711                             
##  PERHKG                        : 683                             
##  BNEHKG, HKGBNE, HKGPVG, PVGHKG: 601                             
##  (Other)                       :7143

There is only 14089 bookings(around 16%) paid in AUD flight from AU. Most of them is multi-way or one-way tickets. All of these bookings is flight to HKG.

create some variable

I create three variables for finding net revenue/passenger, fare/ passenger, because I am to know this information for each passenger , not for each transaction. i.e. \[netPerPerson = Net.Net.Rev/Pnr.Pax.Count\] \[farePerPerson = Pnr.Fare.Amt/Pnr.Pax.Count\]

There is many missing data in netPerPerson column. It cannot find the the ratio of fare to net revenue.

Remove all NA in Net revenue/Person and create a variable

\[fare_to_net = farePerPerson /netPerPerson\] There is so many data removed.

Remove all NA in Net revenue/Person

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.696   1.883   2.258   2.238 112.600

The ratio is mainly around 2.

Then, I am wonder why the ratio is not in same range. Which factor affect this , date of departure , booking class or type of way.( return-way/ one-way/ multi-way)

fare_to_net, by month of departure

## month_dep: .01.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.660   2.108   3.045   3.835  62.760 
## -------------------------------------------------------- 
## month_dep: .02.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.648   1.915   2.733   3.855  25.390 
## -------------------------------------------------------- 
## month_dep: .03.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.653   1.830   2.118   2.143  45.120 
## -------------------------------------------------------- 
## month_dep: .04.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.640   1.787   2.055   2.057  22.370 
## -------------------------------------------------------- 
## month_dep: .05.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.722   1.921   2.135   2.182  49.830 
## -------------------------------------------------------- 
## month_dep: .06.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.712   1.891   2.096   2.155  24.360 
## -------------------------------------------------------- 
## month_dep: .07.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.643   1.790   2.043   2.062  36.020 
## -------------------------------------------------------- 
## month_dep: .08.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.698   1.859   2.073   2.140  18.630 
## -------------------------------------------------------- 
## month_dep: .09.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.690   1.843   2.066   2.107  22.670 
## -------------------------------------------------------- 
## month_dep: .10.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.706   1.848   2.121   2.102 112.600 
## -------------------------------------------------------- 
## month_dep: .11.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.797   2.074   2.506   3.119  41.280 
## -------------------------------------------------------- 
## month_dep: .12.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   3.485   3.894   4.321   4.399  35.950

In December , the median of ratio is the highest in a year.

fare_to_net, by booking subclass

## Bkg_Subclass: B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.542   1.635   1.882   1.848  10.000 
## -------------------------------------------------------- 
## Bkg_Subclass: C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3671  1.3730  1.4620  1.7060  1.5380  5.9860 
## -------------------------------------------------------- 
## Bkg_Subclass: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3012  1.3890  1.4500  1.5830  1.5340 12.7900 
## -------------------------------------------------------- 
## Bkg_Subclass: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.514   1.590   1.818   1.717  17.320 
## -------------------------------------------------------- 
## Bkg_Subclass: F
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.319   1.319   1.319   1.319   1.319   1.319 
## -------------------------------------------------------- 
## Bkg_Subclass: H
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.591   1.692   1.970   1.934  22.670 
## -------------------------------------------------------- 
## Bkg_Subclass: I
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.409   1.475   1.703   1.590  43.020 
## -------------------------------------------------------- 
## Bkg_Subclass: J
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6167  1.3300  1.3900  1.7240  1.5110  7.0410 
## -------------------------------------------------------- 
## Bkg_Subclass: K
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.656   1.783   2.120   2.105  26.620 
## -------------------------------------------------------- 
## Bkg_Subclass: L
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.747   1.955   2.396   2.429  36.020 
## -------------------------------------------------------- 
## Bkg_Subclass: M
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.698   1.864   2.304   2.288 112.600 
## -------------------------------------------------------- 
## Bkg_Subclass: N
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.803   1.921   2.429   2.040  45.120 
## -------------------------------------------------------- 
## Bkg_Subclass: Q
## NULL
## -------------------------------------------------------- 
## Bkg_Subclass: R
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5428  1.4490  1.5040  1.7710  1.5810 41.2800 
## -------------------------------------------------------- 
## Bkg_Subclass: S
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.109   2.195   2.566   2.414  49.830 
## -------------------------------------------------------- 
## Bkg_Subclass: T
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.123   1.671   1.783   2.110   1.917   8.847 
## -------------------------------------------------------- 
## Bkg_Subclass: U
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.710   1.314   1.497   1.886   1.604  13.770 
## -------------------------------------------------------- 
## Bkg_Subclass: V
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.774   1.950   2.352   2.314  35.800 
## -------------------------------------------------------- 
## Bkg_Subclass: W
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.309   1.426   1.455   2.341   1.523  41.280 
## -------------------------------------------------------- 
## Bkg_Subclass: Y
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.239   1.400   1.560   2.414   2.165  12.440

The highest median of ratio is the S class among all booking subclass.

fare_to_net, by type

## type: Muli
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.766   2.026   2.413   2.530  45.120 
## -------------------------------------------------------- 
## type: Oneway
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.620   1.709   2.151   1.861  41.280 
## -------------------------------------------------------- 
## type: return
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.704   1.886   2.231   2.215 112.600

The highest median of ratio is multi-way.

Then , I want to set-up a linear model to predict the net revenue from the price of fare.

Linear_Model

## 
## Call:
## lm(formula = netPerPerson ~ farePerPerson, data = remove_NA_net)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9662.7   -55.0    26.2    88.1  5222.5 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -76.899472   2.174457  -35.37   <2e-16 ***
## farePerPerson   0.578773   0.001155  501.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 345.8 on 61962 degrees of freedom
## Multiple R-squared:  0.8021, Adjusted R-squared:  0.8021 
## F-statistic: 2.511e+05 on 1 and 61962 DF,  p-value: < 2.2e-16

Univariate Analysis

What is the structure of your dataset?

There are 89,206 booking , which pay in AUD, in the dataset with 29 features.

[1] “Pnr.Creat.Date”(the date of create booking) [2] “month_pnr_create”(the month of create booking) [3] “Rloc” ( the reference number of booking)
[4] “Bkg.User.Id” [5] “Collect.Method” (only e-ticket)
[6] “Curr.First.Dep.Apt”
[7] “numberOfDay” ( number of days between date of create booking and date of departure)
[8] “month_dep” (the month of departure)
[9] “from_1st” (the 1st city of departure)
[10] “to_1st” (the 1st city of arrival)
[11] “from_2nd” (the 2nd city of departure)
[12] “to_2nd” (the 2nd city of arrival) [13] “from_3rd” (the 3rd city of departure)
[14] “to_3rd” (the 3rd city of arrival)
[15] “destination”
[16] “type” (one-way/return way/ multi-way)
[17] “Season” (the season of departure
[18] “Curr.First.Dep.Date”
[19] “Pnr.Pax.Count” (the number of passengers and companions in each booking)
[20] “Lang” (language shown on website)
[21] “Pnr.Curr.Code”
[22] “Pnr.Fare.Amt” (the amount of Fare in AUD)
[23] “Bkg.Subclass”
[24] “Bkg_Subclass” (the booking subclass)
[25] “Net.Net.Rev” (the net revenue)
[26] “Sales.Promo”
[27] “first_ticket_O”
[28] “first_ticket_D”
[29] “Route”
[30] “Pnr.Creat.Pseudo.City.Code”

The variables Lang, Destination, and Bkg_Subclass, are ordered factor variables with the following levels.

Lang(the language shown on website): “CN” “DE” “FR” “GB” “ID” “IT” “JP” “KO” “KR” “NL” “RE” “RU” “TH” “TW”

Destination: “ADL” “AMS” “BKK” “BLR” “BNE” “BOM” “BOS” “CAN” “CCU” “CDG” “CEB” “CGK” “CGO” “CJU” “CKG” “CMB” “CNS” “CNX” “CPT” “CRK” “CSX” “CTS” “CTU” “DAC” “DAD” “DEL” “DFW” “DLC” “DME” “DOH” “DPS” “DUR” “DUS” “DXB” “EWR” “FCO” “FOC” “FRA” “FUK” “HAK” “HAN” “HGH” “HKG” “HKT” “HND” “HYD” “ICN” “JFK” “JMU” “JNB” “KHH” “KIX” “KMG” “KTM” “KUL” “KWL” “LAX” “LHR” “MAA” “MAD” “MAN” “MEL” “MLE” “MNL” “MXP” “NGB” “NGO” “NKG” “NRT” “OKA” “ORD” “PEK” “PER” “PNH” “PUS” “PVG” “RMQ” “RUH” “SFO” “SGN” “SHA” “SIN” “SYD” “SYX” “TAO” “TPE” “TSN” “VVO” “WNZ” “WUH” “XIY” “XMN” “YVR” “YYZ” “ZRH”

Bkg_Subclass : (best) —————-> (worst) 1) First Class (Flex): F, 2) Business Class : J, C, D, I 3) Premium Economy : W, R, E 4) Economy Class (Standard/flex) : Y, B, H, K, M, L, V 5) Economy Class(Special) : S, N, Q 6) Economy Class(Supersaver): T,U

Other observations: Most passengers buy return tickets flight from HKG in English version. For each transaction , the median of the number of passenger is one. The amount of fare from AUD 0 to AUD 33,517.4 There is 61505 is return ticket. Some of transaction duplicate that the number of RLOC is more than one.

What is/are the main feature(s) of interest in your dataset?

The main features in the data set are the amount of the fare and net revenue, booking subclass, month of create booking/departure and season . I would like to look into what factors will increase passengers and net renvenues. I suspect if peak season/month and booking subclass are the main factor to predict this data.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Also , departure city (1st , 2nd and 3rd ) , arrival city (1st , 2nd and 3rd ), type of tickets and language shown on website likely contribute to the number of passengers. I think language and type of tickets probably contribute most to the number of passengers after researching information on the dataset.

Did you create any new variables from existing variables in the dataset?

I create three variables for finding net revenue/passenger, fare/ passenger, because I am to know this information for each passenger , not for each transaction. i.e. \[netPerPerson = Net.Net.Rev/Pnr.Pax.Count\] \[farePerPerson = Pnr.Fare.Amt/Pnr.Pax.Count\]

I also create the ratio of fare to net revenue. i.e. \[fare_to_net = farePerPerson /netPerPerson\]

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Suppose the date of booking is earlier than date of flight and the currency is AUD, i.e.numberOfDay >= 0 and Pnr.Curr.Code==‘AUD’ .Also , there is no duplicated RLOC(duplicated data). 850 rows is removed from the dataset. After cleared data, there is 88356 rows

There are too much missing data in the net revenue ( 26605 is NA ). Therefore , only relevant to net revenue will use cleared data set.(remove_NA_net)

Bivariate Plots Section

Correlation_Matrix

##               numberOfDay netPerPerson farePerPerson fare_to_net
## numberOfDay    1.00000000  -0.09984963   -0.04312767  0.19311712
## netPerPerson  -0.09984963   1.00000000    0.89558071 -0.24283984
## farePerPerson -0.04312767   0.89558071    1.00000000  0.02418034
## fare_to_net    0.19311712  -0.24283984    0.02418034  1.00000000

Net revenue/person by booking subclasss

The median of net revenue in economy class is the lowest. The better class, the variance of net revenue is increase.

Booking subclasssn by season

V class is the popular booking class in every seasons. From this graph , it can find that only a small porprotion of passengers is able afford the fare of business class.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Correlation_Matrix

Date of departure, booking class and type of way are the facter affect the ratio of fare to net revenue.In December , the median of ratio is the highest in a year.This result is same as expected because Christmas is in December and it is a important festival for western family.The highest median of ratio is the S class among all booking subclass. The highest median of ratio is multi-way.Normally, multi-way is the most expensive in ticket type.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

It is interested to observe tht economy class is the most popular and the median of net revenue in economy class is the lowest.

What was the strongest relationship you found?

The price of a tickets is positively and strongly correlated with net revenue.The fare could be used in a model to predict the the net revenue in each booking which show perfect correlation.

Multivariate Analysis

Price_Histograms_by_Qualitative_Factors

These density plots explain the odd trends that were seen in the box plots earlier. There is higher variances in lower fare/passenger with different levels of season, booking class and type.

fare_to_net in Season

## Season: Season1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.653   1.851   2.439   2.658  62.760 
## -------------------------------------------------------- 
## Season: Season2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.689   1.864   2.097   2.146  49.830 
## -------------------------------------------------------- 
## Season: Season3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.674   1.828   2.060   2.105  36.020 
## -------------------------------------------------------- 
## Season: Season4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.770   2.021   2.566   3.245 112.600

Wow! Only season 4 has highest median for the ratio having the largest variation. Season2 and Season 3 have the least variation.

fare_to_net in booking class

## Bkg_Subclass: B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.542   1.635   1.882   1.848  10.000 
## -------------------------------------------------------- 
## Bkg_Subclass: C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3671  1.3730  1.4620  1.7060  1.5380  5.9860 
## -------------------------------------------------------- 
## Bkg_Subclass: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3012  1.3890  1.4500  1.5830  1.5340 12.7900 
## -------------------------------------------------------- 
## Bkg_Subclass: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.514   1.590   1.818   1.717  17.320 
## -------------------------------------------------------- 
## Bkg_Subclass: F
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.319   1.319   1.319   1.319   1.319   1.319 
## -------------------------------------------------------- 
## Bkg_Subclass: H
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.591   1.692   1.970   1.934  22.670 
## -------------------------------------------------------- 
## Bkg_Subclass: I
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.409   1.475   1.703   1.590  43.020 
## -------------------------------------------------------- 
## Bkg_Subclass: J
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6167  1.3300  1.3900  1.7240  1.5110  7.0410 
## -------------------------------------------------------- 
## Bkg_Subclass: K
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.656   1.783   2.120   2.105  26.620 
## -------------------------------------------------------- 
## Bkg_Subclass: L
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.747   1.955   2.396   2.429  36.020 
## -------------------------------------------------------- 
## Bkg_Subclass: M
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.698   1.864   2.304   2.288 112.600 
## -------------------------------------------------------- 
## Bkg_Subclass: N
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.803   1.921   2.429   2.040  45.120 
## -------------------------------------------------------- 
## Bkg_Subclass: Q
## NULL
## -------------------------------------------------------- 
## Bkg_Subclass: R
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5428  1.4490  1.5040  1.7710  1.5810 41.2800 
## -------------------------------------------------------- 
## Bkg_Subclass: S
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.109   2.195   2.566   2.414  49.830 
## -------------------------------------------------------- 
## Bkg_Subclass: T
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.123   1.671   1.783   2.110   1.917   8.847 
## -------------------------------------------------------- 
## Bkg_Subclass: U
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.710   1.314   1.497   1.886   1.604  13.770 
## -------------------------------------------------------- 
## Bkg_Subclass: V
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.774   1.950   2.352   2.314  35.800 
## -------------------------------------------------------- 
## Bkg_Subclass: W
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.309   1.426   1.455   2.341   1.523  41.280 
## -------------------------------------------------------- 
## Bkg_Subclass: Y
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.239   1.400   1.560   2.414   2.165  12.440

The discount ticket in economy class have the lowest median ratio. This is such an usual trend.

fare_to_net in type

## type: Muli
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.766   2.026   2.413   2.530  45.120 
## -------------------------------------------------------- 
## type: Oneway
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.620   1.709   2.151   1.861  41.280 
## -------------------------------------------------------- 
## type: return
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.704   1.886   2.231   2.215 112.600

I guess the median of ratio in one-way is the largest, it is different to my expect. The multi-way is the highest median among the type having the largest interquatile range.

fare_to_net Hist by type

It looks like return-way ticket is the popular type. Different type has similar range of ratio.

fare_to_net Hist by Season

fare_to_net Hist by Booking Subclass

These plots support the variability and trends that the boxplots showed from before. I am going see which variables correlate with net revenue.

fare_to_net Hist VS number of days

It seems that there is no strong relationship between the ratio of fare to net and the number of days. For season1 , mostly, they plan the trip before 3 months, but for other season , they likely buy it in shortly .There is no clear trend, pattern or observation in booking subclass and type.

netPerPerson VS farePerPerson

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The multi-way is the highest median among the type having the largest interquatile range. Only season 4 has highest median for the ratio having the largest variation. Season2 and Season 3 have the least variation. The discount ticket in economy class have the lowest median ratio. This is such an usual trend. It seems that there is no strong relationship between the ratio of fare to net and the number of days. For season1 , mostly, they plan the trip before 3 months, but for other season , they likely buy it in shortly .There is no clear trend, pattern or observation in booking subclass and type.

Were there any interesting or surprising interactions between features?

The multi-way is the highest median among the type having the largest interquatile range. Only season 4 has highest median for the ratio having the largest variation. Season2 and Season 3 have the least variation. The discount ticket in economy class have the lowest median ratio. This is such an usual trend. ### OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

Final Plots and Summary

Plot One

Description One

Passengers prefer buy lowest price of tickets,however , the median of net revenue is also lowest.The proportion of season2 in almost every booking subclass is the highest, passengers likely go to travel in Season 2.

Plot Two

### Description Two

It find that the relationship between months and price of fare. For the peak periods, the price will be increse with higher variance.The mean of fare is under AUD 1000 in whole year.(except January) and the all distribution of fare is positive skew.

Plot three

### Description Three

This graph shows the relationship between fare and net revenue booking subclass.Obviously ,the price of fare is higher, the net revenue is higher. Also, there are two recognized lines on the graph that means there are at least two formula for calculate fare by net revenue for different situations.

Reflection

The data set contains booking information on almost 90 thousand transactions from around 2015. I started by understanding the individual variables in the data set and created a linear model to predict net revenue of ticket, and then I explored interesting questions and leads as I continued to make observations on plots. Eventully , I explored the number of passengers and the amount of net renvenue/fare per passenger across many variables. At first , I was wonder why so many bookings created on February and October, it is because the tickets in these two months is the cheapest in a year.It is easy to understand that the booking class is mainly class V/L/S/M/K , which concentrate on enconomy class with lowest median of fare.Also, it is strong demand for travel originating from HongKong ,benefited from the weakness of the Euros and the Australian dollars in the first half of 2015.It reflects demand on regional routes is strong, particular in economy class. There was strong economy class demand on long-haul routes.For this, I am strongly recommend increase flights to popular destination over the peak period.Also, using larger aircarft such as Boeing 777-300ER on popular flight a day will increase capacity.